perm filename CH5G[206,LSP] blob sn#241026 filedate 1976-09-18 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00010 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.REQUIRE "SETUP" SOURCE_FILE
C00007 00003	.ss Some facts about the PDP-10.
C00013 00004	.ss Code produced by LISP compilers.
C00014 00005	.BEGIN VERBATIM SELECT 1
C00017 00006		Note  that all three compilers produce the same first line of
C00020 00007	.ss Listings of LCOM0 and LCOM4
C00021 00008	.SKIP TO LINE 1
C00022 00009	.SKIP TO LINE 1
C00035 00010	.SKIP TO LINE 1
C00039 ENDMK
C⊗;
.REQUIRE "SETUP" SOURCE_FILE
.NEXT SECTION
.NEXT SECTION
.sec Compiling in Lisp

.ss Introduction

	Compiling is an important example of symbolic computation and
has received much study.  Much of the study has been devoted
to parsing which is essentially the transformation of an input string
in the source language into an internal form.  The internal form used
depends on the compiler.  Sometimes it is Polish prefix or postfix notation,
sometimes it is list structure, and sometimes it consists of entries in
a collection of tables.

	When internal notation LISP is being compiled, the parsing is
trivial, because the input is S-expressions and the internal form wanted
is list structure, so what parsing there is is done by the ordinary LISP
%3read%1 routine.  Therefore, compilers can be very compact and
transparent in structure, and we can concentrate our attention on the code
generation phase.  This is as it should be, because, in my opinion,
parsing is basically a side issue in compiling, and code generation is the
matter of main scientific interest.

	We shall describe two compilers in this chapter called LCOM0 and
LCOM4 which compile S-expression LISP into machine language for the PDP-10
computer according to the conventions of Stanford LISP 1.6.  For now we
shall take these conventions for granted.  Before describing the
compilers, we must describe these conventions.

	The target language is called LAP for LISP assembly program.  Each
function is compiled separately into a separate LAP program, and these
programs are written onto a disk file.  These files can later be read into
a LISP core image and assembled in place by the LAP assembler which is
part of the LISP runtime routines.  The compiler, on the other hand, is a
separate LISP core image that can be instructed to compile several input
files.  For an input file called <name>, it produces and output file
called <name>.LAP.  All this is specific to the PDP-10 time-sharing system
and fortunately works the same whether the time-sharing system is the
D.E.C. system, the Stanford system, or TENEX.

	The LAP program produced is a list of words; the last word is NIL,
and the first word is a header of the form (LAP %3fname%1 SUBR) where
%3fname%1 is the name of the function compiled.  This header tells the
DSKIN program that it should read S-expressions till it comes to NIL and
then submit what it has collected to the LAP assembler which will assemble
it into core.  The rest of the words are either atoms representing labels
or lists representing single PDP-10 instructions.
.ss Some facts about the PDP-10.

	The following facts about the PDP-10 may be of use:

	The PDP-10 has a 36 bit word and an 18 bit address.  In
instructions and in accumulators used as index registers the address is
found in the right part of the word where the least significant bits in
arithmetic reside.

	There are 16 general registers which serve simultaneously  as
accumulators  (receiving the results of arithmetic operations), index
registers (modifying the nominal addresses of  instructions  to  form
effective addresses), and as the first 16 registers of memory (if the
effective address of  an  instruction  is  less  than  16,  then  the
instruction uses the corresponding general register as its operand.)

	All instructions have the same format and are written for the
LAP assembly program in the form

	(<op name> <accumulator> <address> <index register>).

Thus #(MOVE 1 3 P)# causes accumulator #1# to receive the contents of
a  memory  register whose address is #3+c(P), i.e. 3+<the contents of
general register P>.
If no index register is specified, then the address used is just <address>.
If the op name is followed by #@, then the memory
reference is indirect, i.e. the effective addrress is the contents of
the right half of the word  directly  addressed  by  the  instruction
(modified  by  the index register and indirect bit of that word).  In
the following description of some instructions useful in constructing
the compiler, <ef> denotes the effective address of an instruction.
.BEGIN INDENT 8,30,8
.PREFACE 0
.AT 0 ⊂BREAK⊃
.AT 24 ⊂""⊃
.TURN ON "∂"
.VARIABLE I
.I ← 31
MOVE			∂(I)(ac) ← c(<ef>)
MOVEI			∂(I)c(ac) ← <ef>
MOVEM			∂(I)c(<ef>) ← c(ac)
HLRZ			∂(I)right half of c(ac) ← left half of c(<ef>)
HRRZ			∂(I)right half of c(ac) ← right half of c(<ef>)
∂(16)(These two are used indirectly for #car# and #cdr# respectively.)
ADD			∂(I)c(ac) ← c(ac) + c(<ef>)
SUB			∂(I)c(ac) ← c(ac) - c(<ef>)
JRST			∂(I)go to <ef>
JUMPE			∂(I)if c(ac) %2=%* 0 then go to <ef>
JUMPN			∂(I)if c(ac) %2≠%* 0 then go to <ef>
CAME			∂(I)if c(ac) %2=%* c(<ef>) then <skip next instruction>
CAMN			∂(I)if c(ac) %2≠%* c(<ef>) then <skip next instruction>
PUSH			∂(I)c(c(right half of ac)) ← c(<ef>); the contents
			of each half of ac is increased by one and if
			the contents of the left half is then #0, a stack
			overflow interrupt occurs.  (PUSH P ac) is
			is used in LISP to put c(ac) on the 
			stack.
POP			∂(I)c(<ef>) ←c(c(right half of ac)); the
			contents of each half of ac are then decreased
			by 1.  This may be used for removing the
			top element of the stack.  Thus (POP P 1) puts
			the top element of the stack in accumulator 1.
			The i-th element of the stack is obtained by
			(MOVE@ 1 i P).
POPJ			∂(I)(POPJ P) is used for returning from a subroutine
.END
	These instructions are adequate for compiling basic LISP code
with  the addition of the subroutine calling pseudo-instrucion. (CALL
%3n%* (E <subr) S) is used for calling the LISP subroutine <subr> with  %3n%1
arguments.   The  convention  is that the arguments will be stored in
successive accumulators beginning with accumulator #1, and the result
will  be  returned  in  accumulator  #1.   In particular the functions
#ATOM# and #CONS# are called with #(CALL 1 (E ATOM) S)# and (CALL 2  (E
CONS) S)  respectively,  and  programs produced by you compiler will be
called similarly when their names are referred to.
.ss Code produced by LISP compilers.

	We will discuss two compilers, a simple one called LCOM0 and a
more optimising compiler called LCOM4.  LCOM4 produces about half as many
instructions for a given function as does LCOM0. Besides these, there is
the standard PDP-10 LISP compiler written at M.I.T.  by Richard Greenblatt
and Stuart Nelson and modified by Whitfield Diffie.
.SKIP TO LINE 1
.BEGIN VERBATIM SELECT 1
Examples of output of LCOM0, LCOM4, and the regular Lisp compiler

For the file containing:

(DEFPROP DROP
 (LAMBDA(X)
  (COND ((NULL X) NIL) (T (CONS (LIST (CAR X)) (DROP (CDR X))))))
EXPR)

******************************************************************

LCOM0 procuces the following code:

(LAP DROP SUBR) 
(PUSH P 1) 
(MOVE 1 0 P) 
(PUSH P 1) 
(MOVE 1 0 P) 
(SUB P (C 1 0 1 0)) 
(CALL 1 (E NULL) S) 
(JUMPE 1 G0163) 
(MOVEI 1 0) 
(JRST G0162) 
G0163 
(MOVEI 1 (QUOTE T)) 
(JUMPE 1 G0164) 
(MOVE 1 0 P) 
(PUSH P 1) 
(MOVE 1 0 P) 
(SUB P (C 1 0 1 0)) 
(CALL 1 (E CAR) S) 
(PUSH P 1) 
(MOVE 1 0 P) 
(SUB P (C 1 0 1 0)) 
(CALL 1 (E LIST) S) 
(PUSH P 1) 
(MOVE 1 -1 P) 
(PUSH P 1) 
(MOVE 1 0 P) 
(SUB P (C 1 0 1 0)) 
(CALL 1 (E CDR) S) 
(PUSH P 1) 
(MOVE 1 0 P) 
(SUB P (C 1 0 1 0)) 
(CALL 1 (E DROP) S) 
(PUSH P 1) 
(MOVE 1 -1 P) 
(MOVE 2 0 P) 
(SUB P (C 2 0 2 0)) 
(CALL 2 (E CONS) S) 
(JRST G0162) 
G0164 
G0162 
(SUB P (C 1 0 1 0)) 
(POPJ P) 
NIL

LCOM4 produces the following code:

(LAP DROP SUBR) 
(PUSH P 1) 
(MOVE 1 0 P) 
(JUMPE 1 G0162) 
(HLRZ@ 1 0 P) 
(CALL 1 (E LIST) S) 
(PUSH P 1) 
(HRRZ@ 1 -1 P) 
(CALL 1 (E DROP) S) 
(MOVE 2 1) 
(MOVE 1 0 P) 
(SUB P (C 1 0 1 0)) 
(CALL 2 (E CONS) S) 
G0162 
(SUB P (C 1 0 1 0)) 
(POPJ P) 
NIL

******************************************************************

ILISP COMPILER produces the following code:

(LAP DROP SUBR) 
       (PUSH P 1) 
       (JUMPE 1 TAG1) 
       (HLRZ@ 1 0 P) 
       (CALL 1 (E NCONS) S) 
       (PUSH P 1) 
       (HRRZ@ 1 -1 P) 
       (CALL 1 (E DROP) S) 
       (POP P 2) 
       (CALL 2 (E XCONS) S) 
 TAG1  (SUB P (C 1 0 1 0)) 
       (POPJ P) 
       NIL 

.END
	Note  that all three compilers produce the same first line of
heading.  This is necessary for the LAP assembly program  which  also
requires  the #NIL# at the end as punctuation.  The standard compiler
uses a function called #XCONS# that has  the  same  effect  as  CONS#
except  that  it receives its arguments in the reverse order which is
sometimes convenient.  This requires, of course, that the compiler be
smart  enough  to  decide  where  it  is  more  convenient to put the
arguments of the %3cons%1 function.

	My  two  compilers  are  similar in structure, but the better
one, which is twice as long, uses the following optimisations.

	1. When the argument of CAR or CDR is a variable, it compiles
a  (HLRZ@  1  i P) or (HRRZ@ 1 i P) which gets the result through the
stack without first compiling the argument into an accumulator.

	2. When it has to set up the arguments of a function  in  the
accumulators,  in general, the program must compute the arguments one
at a time and save them on the stack, and then load the  accumulators
from the stack.  However, if one of the arguments is a variable, is a
quoted expression, or can be obtained from a variable by a  chain  of
#cars#  and  #cdrs,  then  it  need not be computed until the time of
loading  accumulators  since  it  can  be  computed  using  only  the
accumulator in which it is wanted.

	3.  The  Diffy  compiler  produces about 10 percent less code
than LCOM4; the main difference seems to be that it sometimes notices
when  it  has  an  expression that it will need later. Sometimes this
feature leads it astray. For example, it may save #%4a %3u%1#
for later use at greater cost than re-computing it.

	It  should  further be noted that quoted S-expressions can be
compiled with the instruction

	(MOVEI 1 (QUOTE α)).

.ss Listings of LCOM0 and LCOM4

	First are listings of both compilers in blackboard notation.
Following these is an an annotated listing of LCOM0 in S-expression
notation, and a listing of LCOM4 in that notation.
.SKIP TO LINE 1
LCOM0:

.REQUIRE "LC0.BB" SOURCE_FILE
.SKIP TO LINE 1
LCOM4:

.REQUIRE "LC4.BB" SOURCE_FILE
.SKIP TO LINE 1
Annotated listing of LCOM0:

.BEGIN VERBATIM SELECT 1
(DEFPROP LC0FNS
 (LC0FNS COMPL
	 COMP
	 PRUP
	 MKPUSH
	 COMPEXP
	 COMPLIS
	 LOADAC
	 COMCOND
	 COMBOOL
	 COMPANDOR)
VALUE)

~COMPL is the user-callable driver.  It is a FEXPR.  It takes as
~   an argument a single file name, which must have no extension.
~   EXPRs on a file called FILNAM will be compiled into LAP and
~   written on the file FILNAM.LAP. Other types of function 
~   definitions and non-definitions are simply copied to output.

(DEFPROP COMPL
 (LAMBDA(FILE)
  (PROG	(Z)
	(EVAL
	 (CONS (QUOTE OUTPUT)
	       (CONS (QUOTE DSK:)
		     (LIST (CONS (CAR FILE) (QUOTE LAP))))))
	(EVAL (CONS (QUOTE INPUT) (CONS (QUOTE DSK:) FILE)))
	(INC T NIL)
   LOOP	(SETQ Z (ERRSET (READ)))
	(COND ((ATOM Z) (GO DONE)))
	(SETQ Z (CAR Z))
	(COND ((OR (EQ (CAR Z) (QUOTE DE))
		   (AND	(EQ (CAR Z) (QUOTE DEFPROP))
			(EQ (CADDDR Z) (QUOTE EXPR))))
	       (PROG (PROG)
		     (SETQ PROG
			   (COND ((EQ (CAR Z) (QUOTE DE))
				  (COMP	(CADR Z)
					(CADDR Z)
					(CADDDR Z)))
				 (T
				  (COMP	(CADR Z)
					(CADR (CADDR Z))
					(CADDR (CADDR Z))))))
		     (OUTC T NIL)
		     (MAPC (FUNCTION PRINT) PROG)
		     (OUTC NIL NIL)
		     (PRINT (LIST (CADR Z) (LENGTH PROG)))))
	      (T (OUTC T NIL) (PRINT Z) (OUTC NIL NIL)))
	(GO LOOP)
   DONE	(OUTC T NIL)
	(OUTC NIL T)
	(INC NIL T)
	(RETURN (QUOTE ENDCOMP))))
FEXPR)

~COMP compiles a single function definition, returning a list of
~   the LAP code corresponding to the definition.  
~   FN is the atomic name of the function being compiled.
~   VARS is the formal parameter list for the function.
~   EXP is the function body.

(DEFPROP COMP
 (LAMBDA(FN VARS EXP)
  ((LAMBDA(N)
    (APPEND (LIST (LIST (QUOTE LAP) FN (QUOTE SUBR)))
	    (MKPUSH N 1)
	    (COMPEXP EXP (MINUS N) (PRUP VARS 1))
	    (LIST
	     (LIST (QUOTE SUB) (QUOTE P) (LIST (QUOTE C) N 0 N 0)))
	    (QUOTE ((POPJ P) NIL))))
   (LENGTH VARS)))
EXPR)

~PRUP returns an A-LIST formed by pairing successive elements of
~   VARS with consecutive integers beginning with N.

(DEFPROP PRUP
 (LAMBDA(VARS N)
  (COND	((NULL VARS) NIL)
	(T (CONS (CONS (CAR VARS) N) (PRUP (CDR VARS) (PLUS N 1))))))
EXPR)

~MKPUSH returns a list of N (PUSH P i) instructions, where i runs
~   from M to M+N-1.  Used to push arguments onto the stack.

(DEFPROP MKPUSH
 (LAMBDA(N M)
  (COND	((LESSP N M) NIL)
	(T
	 (CONS (LIST (QUOTE PUSH) (QUOTE P) M)
	       (MKPUSH N (PLUS M 1))))))
EXPR)

~COMPEXP is the heart of LCOM0.  It determines precisely
~   what an expression is, and compiles appropriate code
~   for it.  It returns a list of that code.
~   EXP is the expression to be compiled.
~   M is minus the number of entries on the stack. When
~      added to a value retrieved from the A-LIST VPR, it
~      can be used to locate a variable on the stack.
~   VPR is an A-LIST, associating variable names with 
~      numbers which, when added to M, give stack offsets.
~   Both M and VPR maintain these definitions throughout.

(DEFPROP COMPEXP
 (LAMBDA(EXP M VPR)
  (COND	((NULL EXP) (QUOTE ((MOVEI 1 0))))          ~NIL

	((EQ EXP T) (QUOTE ((MOVEI 1 (QUOTE T)))))  ~T

	((ATOM EXP)                                 ~variable
	 (LIST
	  (LIST	(QUOTE MOVE)
		1
		(PLUS M (CDR (ASSOC EXP VPR)))
		(QUOTE P))))

	((OR (EQ (CAR EXP) (QUOTE AND))             ~boolean expression
	     (EQ (CAR EXP) (QUOTE OR))
	     (EQ (CAR EXP) (QUOTE NOT)))
	 ((LAMBDA(L1 L2)
	   (APPEND
	    (COMBOOL EXP M L1 NIL VPR)
	    (LIST (QUOTE (MOVEI 1 (QUOTE T)))
		  (LIST (QUOTE JRST) 0 L2)
		  L1
		  (QUOTE (MOVEI 1 0))
		  L2)))
	  (GENSYM)
	  (GENSYM)))

	((EQ (CAR EXP) (QUOTE COND))               ~COND
	 (COMCOND (CDR EXP) M (GENSYM) VPR))

	((EQ (CAR EXP) (QUOTE QUOTE))              ~QUOTE
	 (LIST (LIST (QUOTE MOVEI) 1 EXP)))

	((ATOM (CAR EXP))                          ~function call
	 ((LAMBDA(N)
	   (APPEND
	    (COMPLIS (CDR EXP) M VPR)
	    (LOADAC (DIFFERENCE 1 N) 1)
	    (LIST
	     (LIST (QUOTE SUB) (QUOTE P) (LIST (QUOTE C) N 0 N 0)))
	    (LIST
	     (LIST (QUOTE CALL)
		   N
		   (LIST (QUOTE E) (CAR EXP))
		   (QUOTE S)))))
	  (LENGTH (CDR EXP))))

	((EQ (CAAR EXP) (QUOTE LAMBDA))           ~LAMBDA expression
	 ((LAMBDA(N)
	   (APPEND
	    (COMPLIS (CDR EXP) M VPR)
	    (COMPEXP
	     (CADDAR EXP)
	     (DIFFERENCE M N)
	     (APPEND (PRUP (CADAR EXP) (DIFFERENCE 1 M)) VPR))
	    (LIST
	     (LIST (QUOTE SUB) (QUOTE P) (LIST (QUOTE C) N 0 N 0)))))
	  (LENGTH (CDR EXP))))

	(T NIL)))                                 ~oops
EXPR)

~COMPLIS compiles code to evaluate each expression in a list of
~   expressions and to push those values onto the stack.  It
~   returns a list of that code.  It is used to compile code
~   to evaluate arguments to called functions or LAMBDA expressions.
~   U is a list of expressions.

(DEFPROP COMPLIS
 (LAMBDA(U M VPR)
  (COND	((NULL U) NIL)
	(T
	 (APPEND (COMPEXP (CAR U) M VPR)
		 (QUOTE ((PUSH P 1)))
		 (COMPLIS (CDR U) (DIFFERENCE M 1) VPR)))))
EXPR)

~LOADAC returns a list of (MOVE i j P) instructions, loading
~   consecutive accumulators from the top of the stack.
~   K indexes the accumulator loaded.
~   N indexes the stack offset.

(DEFPROP LOADAC
 (LAMBDA(N K)
  (COND	((GREATERP N 0) NIL)
	(T
	 (CONS (LIST (QUOTE MOVE) K N (QUOTE P))
	       (LOADAC (PLUS N 1) (PLUS K 1))))))
EXPR)

~COMCOND compiles a COND.  
~   U is a list of clauses in the COND.
~   L is a label to be emitted at the end of all code for
~      the COND.

(DEFPROP COMCOND
 (LAMBDA(U M L VPR)
  (COND	((NULL U) (LIST L))
	(T
	 ((LAMBDA(L1)
	   (APPEND (COMBOOL (CAAR U) M L1 NIL VPR)
		   (COMPEXP (CADAR U) M VPR)
		   (LIST (LIST (QUOTE JRST) L) L1)
		   (COMCOND (CDR U) M L VPR)))
	  (GENSYM)))))
EXPR)

~COMBOOL compiles code for a single predicate.  That is, the
~   code generated evaluates the predicate and branches somewhere,
~   depending on the value.
~   P is the predicate.
~   L is a label which represents the branch point.
~   FLG is a flag.  If FLG is NIL, code is to fall thru on non-NIL
~      result and branch to L on NIL result.  If FLG is non-NIL,
~      code is to fall thru on NIL result and branch to L on
~      non-NIL result.

(DEFPROP COMBOOL
 (LAMBDA(P M L FLG VPR)
  (COND	((ATOM P)                                        ~simple variable
	 (APPEND
	  (COMPEXP P M VPR)
	  (LIST
	   (LIST (COND (FLG (QUOTE JUMPN)) (T (QUOTE JUMPE))) 1 L))))

	((EQ (CAR P) (QUOTE AND))                        ~conjunction
	 (COND ((NOT FLG) (COMPANDOR (CDR P) M L NIL VPR))
	       (T
		((LAMBDA(L1)
		  (APPEND (COMPANDOR (CDR P) M L1 NIL VPR)
			  (LIST (LIST (QUOTE JRST) 0 L))
			  (LIST L1)))
		 (GENSYM)))))

	((EQ (CAR P) (QUOTE OR))                         ~disjunction
	 (COND (FLG (COMPANDOR (CDR P) M L T VPR))
	       (T
		((LAMBDA(L1)
		  (APPEND (COMPANDOR (CDR P) M L1 T VPR)
			  (LIST (LIST (QUOTE JRST) 0 L))
			  (LIST L1)))
		 (GENSYM)))))

	((EQ (CAR P) (QUOTE NOT))                        ~negation
	 (COMBOOL (CADR P) M L (NOT FLG) VPR))

	(T                                               ~other expression
	 (APPEND
	  (COMPEXP P M VPR)
	  (LIST
	   (LIST (COND (FLG (QUOTE JUMPN)) (T (QUOTE JUMPE)))
		 1
		 L))))))
EXPR)

~COMPANDOR compiles code for lists of predicates connected 
~   conjunctively or disjunctively.
~   U is a list of predicates.
~   L is a label.
~   FLG is a flag.  If FLG is NIL, we are to fall thru on non-NIL
~      results and branch to L on NIL results (AND case).  If FLG
~      is non-NIL, we are to fall thru on NIL results and branch
~      to L on non-NIL results (OR case).

(DEFPROP COMPANDOR
 (LAMBDA(U M L FLG VPR)
  (COND	((NULL U) NIL)
	(T
	 (APPEND (COMBOOL (CAR U) M L FLG VPR)
		 (COMPANDOR (CDR U) M L FLG VPR)))))
EXPR)
.END
.SKIP TO LINE 1
LCOM4:
.BEGIN VERBATIM SELECT 1
.REQUIRE "LCOM4" SOURCE_FILE
.END